Medical Visual Question Answering (Medical-VQA) aims to answer clinical questions about radiology images, assisting doctors with decision-making. Nevertheless, current Medical-VQA models learn cross-modal representations by placing the vision and text encoders in two separate spaces, which leads to indirect semantic alignment. In this paper, we propose UnICLAM, a Unified and Interpretable Medical-VQA model based on Contrastive Representation Learning with Adversarial Masking. Specifically, to learn an aligned image-text representation, we first establish a unified dual-stream pre-training structure with a gradually soft parameter-sharing strategy. Technically, the proposed strategy learns a constraint that keeps the vision and text encoders close in the same space, and the constraint is gradually loosened as the number of layers increases. Moreover, to grasp the semantic representation, we extend an adversarial masking data-augmentation strategy to the contrastive representation learning of vision and text in a unified manner, alleviating the meaninglessness of the commonly used random masking. Concretely, while the encoder training minimizes the distance between the original and the masked features, the adversarial masking model is trained to conversely maximize this distance. Furthermore, we take a closer look at the unified adversarial masking strategy and find that it improves ante-hoc interpretability while retaining remarkable performance and efficiency. Experimental results on the public VQA-RAD and SLAKE benchmarks demonstrate that UnICLAM outperforms 11 existing state-of-the-art Medical-VQA models. More importantly, we additionally discuss the performance of UnICLAM in diagnosing heart failure, verifying that UnICLAM exhibits superior few-shot adaptation performance in practical disease diagnosis.
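The min-max game described above can be made concrete with a short sketch. Below is a minimal PyTorch illustration, assuming a generic transformer encoder and a hypothetical `MaskGenerator` that predicts soft token masks; dimensions, optimizers, and the feature-distance choice are illustrative, not the authors' configuration.

```python
import torch
import torch.nn as nn

class MaskGenerator(nn.Module):
    """Adversary: predicts a soft keep-probability per input token."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, tokens):
        return torch.sigmoid(self.score(tokens))  # (B, T, 1) in (0, 1)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2)
masker = MaskGenerator(dim=64)
opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-4)
opt_msk = torch.optim.Adam(masker.parameters(), lr=1e-4)

tokens = torch.randn(8, 16, 64)                # dummy batch of token embeddings

# Step 1: the encoder minimizes the original-vs-masked feature distance.
mask = masker(tokens).detach()                 # freeze the adversary
dist = (encoder(tokens).mean(1) - encoder(tokens * mask).mean(1)).pow(2).sum(-1).mean()
opt_enc.zero_grad(); dist.backward(); opt_enc.step()

# Step 2: the adversarial masker maximizes the same distance
# (only the masker is stepped; stray encoder grads are cleared next iteration).
mask = masker(tokens)
with torch.no_grad():
    ref = encoder(tokens).mean(1)
dist = (ref - encoder(tokens * mask).mean(1)).pow(2).sum(-1).mean()
opt_msk.zero_grad(); (-dist).backward(); opt_msk.step()
```

The alternation mirrors the abstract: the encoder pulls the original and masked views together, while the adversarial masker pushes them apart, forcing the masks onto semantically informative tokens rather than random ones.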
The ever-growing deep learning technologies are making revolutionary changes to modern life. However, conventional computing architectures are designed to process sequential, digital programs and are extremely burdened by massively parallel and adaptive deep learning applications. Photonic integrated circuits provide an efficient approach to mitigate the bandwidth limitations and power wall of their electronic counterparts, showing great potential for ultrafast and energy-free high-performance computing. Here, we propose an optical computing architecture enabled by on-chip diffraction to implement convolutional acceleration, termed the optical convolution unit (OCU). We demonstrate that any real-valued convolution kernel can be implemented by the OCU, with a prominent boost in computational throughput via the concept of structural re-parameterization. With the OCU as the fundamental unit, we build an optical convolutional neural network (oCNN) to implement two popular deep learning tasks: classification and regression. For classification, the Fashion-MNIST and CIFAR-4 datasets are tested with accuracies of 91.63% and 86.25%, respectively. For regression, we build an optical denoising convolutional neural network (oDnCNN) to handle Gaussian noise in grayscale images with noise levels σ = 10, 15, and 20, yielding clean images with average PSNRs of 31.70 dB, 29.39 dB, and 27.72 dB, respectively. The proposed OCU presents remarkable performance in low energy consumption and high information density due to its fully passive nature and compact footprint, providing a highly parallel yet lightweight solution for future computing architectures to handle high-dimensional tensors in deep learning.
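Structural re-parameterization standardly means collapsing several parallel linear branches into a single equivalent kernel so that inference runs one convolution instead of many; reading the abstract that way is an assumption on our part, and the NumPy sketch below illustrates only this generic idea, not the on-chip implementation.

```python
import numpy as np
from scipy.signal import convolve2d

# Three parallel linear branches: a 3x3 conv, a 1x1 conv, and an identity.
k3 = np.random.randn(3, 3)
k1 = np.pad(np.random.randn(1, 1), 1)          # 1x1 kernel zero-padded to 3x3
identity = np.zeros((3, 3)); identity[1, 1] = 1.0

k_merged = k3 + k1 + identity                  # one re-parameterized kernel

img = np.random.rand(16, 16)
multi = (convolve2d(img, k3, mode="same")
         + convolve2d(img, k1, mode="same")
         + convolve2d(img, identity, mode="same"))
single = convolve2d(img, k_merged, mode="same")
assert np.allclose(multi, single)              # identical output, 1/3 the work
```

By linearity the merged kernel reproduces the multi-branch output exactly, which is why re-parameterization trades no accuracy for the throughput gain.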
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice, as well as the bottlenecks, faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%), and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once, which was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Non-line-of-sight (NLOS) imaging is an emerging technique for detecting objects hidden behind obstacles or around corners. Recent studies on passive NLOS imaging mainly focus on steady-state measurement and reconstruction methods, which show limitations in recognizing moving targets. To the best of our knowledge, we propose a novel event-based passive NLOS imaging method. We acquire asynchronous event-based data that contains detailed dynamic information about the NLOS target and effectively mitigates the speckle degradation caused by motion. In addition, we create the first event-based NLOS imaging dataset, NLOS-ES, and extract event-based features using a time-surface representation. We compare reconstructions from event-based data against those from frame-based data. The event-based method performs well on PSNR and LPIPS, surpassing the frame-based method by 20% and 10%, respectively, while the data volume is only 2% of that of the traditional method.
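The time-surface feature mentioned above can be sketched compactly. The snippet below is an illustrative NumPy implementation of an exponentially decayed time surface (in the style of HOTS-type event features); the event format, the decay constant `tau`, and the sensor size are assumptions, not details from the paper.

```python
import numpy as np

def time_surface(events, shape, t_ref, tau=50e-3):
    """Exponentially decayed time surface: each pixel stores
    exp(-(t_ref - t_last) / tau) for its most recent event before t_ref."""
    last_t = np.full(shape, -np.inf)
    for t, x, y, p in events:                  # (timestamp, x, y, polarity)
        if t <= t_ref:
            last_t[y, x] = max(last_t[y, x], t)
    ts = np.exp(-(t_ref - last_t) / tau)
    ts[np.isinf(last_t)] = 0.0                 # pixels that never fired
    return ts

# Toy usage: three events on a 4x4 sensor, queried at t_ref = 0.1 s.
events = [(0.02, 1, 1, 1), (0.05, 2, 3, -1), (0.09, 1, 1, 1)]
print(time_surface(events, (4, 4), t_ref=0.1))
```

Because recent events dominate the surface, the representation naturally encodes the target's dynamics that frame-based integration averages away.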
Cloth-changing person re-identification (ReID) is a newly emerging research topic that aims to retrieve pedestrians whose clothes have changed. Since human appearance under different clothes exhibits large variations, it is difficult for existing methods to extract discriminative and robust feature representations. Current works mainly focus on body shape or contour sketches, but human semantic information and the potential consistency of pedestrian features before and after clothes changing are not fully explored, or are ignored. To address these issues, in this work a novel semantic-aware attention and visual shielding network for cloth-changing person ReID (abbreviated as SAVS) is proposed, where the key idea is to shield the cues related to clothes appearance and focus only on visual semantic information that is insensitive to view/posture changes. Specifically, a visual semantic encoder is first employed to locate the human body and clothing regions based on human semantic segmentation information. Then, a human semantic attention module (HSA) is proposed to highlight the human semantic information and reweight the visual feature map. In addition, a visual clothes shielding module (VCS) is designed to extract a more robust feature representation by covering the clothing regions and focusing the model on visual semantic information unrelated to clothes. Most importantly, these two modules are jointly explored in an end-to-end unified framework. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art methods and extracts more robust features for cloth-changing persons. Compared with FSAM (published in CVPR 2021), this method achieves improvements of 32.7% (16.5%) and 14.9% (–) in mAP (rank-1) on the LTCC and PRCC datasets, respectively.
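As a rough illustration of the clothes-shielding idea, the hypothetical PyTorch snippet below zeroes out feature-map locations whose human-parsing label belongs to clothing classes; the class ids, tensor shapes, and function names are invented for the example and do not come from the paper.

```python
import torch

CLOTHING_CLASSES = {5, 6, 7}                   # e.g. upper-clothes/dress/coat ids (illustrative)

def shield_clothes(feat, parsing):
    """feat: (B, C, H, W) visual features; parsing: (B, H, W) semantic labels.
    Returns features with clothing regions masked to zero."""
    keep = torch.ones_like(parsing, dtype=feat.dtype)
    for c in CLOTHING_CLASSES:
        keep[parsing == c] = 0.0               # drop clothing pixels
    return feat * keep.unsqueeze(1)            # broadcast mask over channels

feat = torch.randn(2, 64, 24, 12)              # dummy backbone features
parsing = torch.randint(0, 10, (2, 24, 12))    # dummy human-parsing map
masked = shield_clothes(feat, parsing)
```

Masking at the feature level rather than the pixel level lets the rest of the network keep full spatial context while being unable to rely on clothing appearance.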
Reconstruction of 3D pulmonary segments plays an important role in the surgical treatment planning of lung cancer, helping to preserve pulmonary function and ensure low recurrence rates. However, automatic reconstruction of pulmonary segments remains unexplored in the era of deep learning. In this paper, we investigate what makes for automatic reconstruction of pulmonary segments. First, we formulate the anatomical definitions of pulmonary segments both clinically and geometrically, and propose evaluation metrics adhering to these definitions. Second, we propose ImPulSe (Implicit Pulmonary Segment), a deep implicit surface model designed for pulmonary segment reconstruction. Pulmonary segments automatically reconstructed by ImPulSe are accurate in metrics and visually appealing. Compared with canonical segmentation methods, ImPulSe outputs continuous predictions at arbitrary resolutions with higher training efficiency and fewer parameters. Finally, we experiment with different network inputs to analyze what matters in the task of pulmonary segment reconstruction. Our code is available at https://github.com/m3dv/impulse.
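A deep implicit surface model of this kind can be summarized in a few lines: a coordinate MLP is queried at arbitrary resolution. The PyTorch sketch below is a generic stand-in, not ImPulSe itself; the unconditioned MLP and the 18-class output (the 18 bronchopulmonary segments) are simplifying assumptions, since in practice an image feature vector would be concatenated to the coordinates.

```python
import torch
import torch.nn as nn

NUM_SEGMENTS = 18                              # bronchopulmonary segments

# Coordinate MLP: continuous (x, y, z) -> per-segment occupancy logits.
net = nn.Sequential(
    nn.Linear(3, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, NUM_SEGMENTS),
)

# Query a dense grid at an arbitrary resolution, e.g. 64^3.
g = torch.linspace(-1, 1, 64)
coords = torch.stack(torch.meshgrid(g, g, g, indexing="ij"), dim=-1).reshape(-1, 3)
labels = net(coords).argmax(-1).reshape(64, 64, 64)   # discrete segment map
```

Because the network is a function of continuous coordinates, the same trained model yields a 64³ or a 512³ prediction without retraining, which is the resolution-free property the abstract contrasts with canonical voxel segmentation.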
Learned image compression has been shown to outperform conventional image coding techniques and is becoming practical for industrial applications. One of the most critical issues that needs to be considered is non-deterministic computation, which makes the probability prediction inconsistent across platforms and frustrates successful decoding. We propose to solve this problem by introducing well-developed post-training quantization and making the model inference integer-arithmetic-only, which is much simpler than existing training- and fine-tuning-based approaches yet still keeps the superior rate-distortion performance of learned image compression. Based on that, we further improve the discretization of the entropy parameters and extend the deterministic inference to fit Gaussian mixture models. With our proposed methods, current state-of-the-art image compression models can infer in a cross-platform-consistent manner, which makes the further development and practice of learned image compression more promising.
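To see why integer-arithmetic-only inference removes the cross-platform inconsistency, consider the minimal NumPy sketch below: once values are rounded onto a shared integer grid, every platform computes bit-identical integer dot products, with a single float rescale at the end. The symmetric quantization scheme shown is a generic one, assumed for illustration rather than taken from the paper.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Symmetric post-training quantization to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

w, a = np.random.randn(4, 8), np.random.randn(8)
wq, ws = quantize(w)
aq, as_ = quantize(a)
y_int = wq @ aq                     # exact, reproducible integer arithmetic
y = y_int * (ws * as_)              # one deterministic float rescale
print(np.abs(y - w @ a).max())      # small, bounded quantization error
```

Floating-point matmuls can differ in the last bits across hardware and BLAS libraries; integer matmuls cannot, so the entropy model's probabilities, and hence the arithmetic-coded bitstream, decode identically everywhere.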
Deep learning has shown astonishing performance in accelerated magnetic resonance imaging (MRI). State-of-the-art deep learning reconstructions adopt powerful convolutional neural networks and perform 2D convolutions, since many magnetic resonance images, or their corresponding k-space data, are 2D. In this work, we present a new approach that explores 1D convolutions, making deep networks easier to train and better generalized. We further integrate the 1D convolutions into the proposed deep network, named the One-dimensional Deep Low-rank and Sparse network (ODLS), which unrolls the iterative procedure of a low-rank and sparse reconstruction model. Extensive results on in-vivo knee and brain datasets demonstrate that the proposed ODLS is very suitable for cases with limited training subjects and provides improved reconstruction performance, both visually and quantitatively, over state-of-the-art methods. In addition, ODLS shows good robustness to different undersampling scenarios and to some mismatches between training and test data. In summary, our work demonstrates that 1D deep learning schemes are memory-efficient and robust in fast MRI.
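A simple way to picture a 1D-convolution scheme on 2D data is to fold one spatial axis into the batch dimension and convolve along the other, which sharply cuts the parameter count relative to 2D kernels. The PyTorch sketch below illustrates this folding; the shapes, the real/imaginary channel layout, and the single layer are assumptions, not the exact ODLS block.

```python
import torch
import torch.nn as nn

# 1D convolution along the frequency-encoding axis (kx), with the phase-
# encoding axis (ky) folded into the batch: 2*2*5 = 20 weights vs. the
# 2*2*5*5 = 100 of an equivalent 5x5 2D kernel.
conv1d = nn.Conv1d(in_channels=2, out_channels=2, kernel_size=5, padding=2)

kspace = torch.randn(1, 2, 256, 256)           # (batch, real/imag, ky, kx)
b, c, ky, kx = kspace.shape
x = kspace.permute(0, 2, 1, 3).reshape(b * ky, c, kx)   # fold ky into batch
out = conv1d(x).reshape(b, ky, c, kx).permute(0, 2, 1, 3)
print(out.shape)                               # torch.Size([1, 2, 256, 256])
```

Fewer parameters per layer is one plausible reason such networks train well from few subjects, consistent with the limited-training-data results reported above.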
Relative radiometric normalization (RRN) of different satellite images of the same terrain is necessary for change detection, object classification/segmentation, and mapping tasks. However, traditional RRN models are not robust, being perturbed by object changes, while RRN models that precisely account for object changes cannot robustly obtain the no-change set. This paper proposes an automatic, robust relative radiometric normalization method based on latent change-noise modeling. It exploits the prior knowledge that, after relative radiometric normalization, no-change points exhibit small-scale noise while change points exhibit large-scale radiometric noise, and combines this with a stochastic expectation-maximization method to quickly and robustly extract the no-change set and learn the relative radiometric normalization mapping function. This grounds our model theoretically in probability theory and mathematical deduction. Specifically, when we select histogram matching as the relative radiometric normalization scheme combined with Gaussian-mixture noise modeling (HM-RRN-MoG), the HM-RRN-MoG model achieves the best performance. Our model is robust against clouds, fog, and changes. Our method naturally produces a robust evaluation indicator for RRN, namely the no-change-set root mean square error. We apply the HM-RRN-MoG model to downstream vegetation/water change detection tasks, where it reduces the radiometric contrast and the NDVI/NDWI differences on the no-change set, producing consistent and comparable results. We further utilize the no-change set in a building change detection task, effectively reducing pseudo-changes and improving precision.
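The latent change-noise idea can be illustrated with a two-component Gaussian mixture fit to per-pixel residuals after a trial normalization: the low-variance component is read off as the no-change set. The sketch below uses scikit-learn's batch EM as a stand-in for the stochastic EM described above, with synthetic residuals in place of real imagery.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
no_change = rng.normal(0.0, 0.02, 9000)        # small-scale noise (no-change)
change = rng.normal(0.0, 0.5, 1000)            # large-scale noise (change)
residuals = np.concatenate([no_change, change]).reshape(-1, 1)

# Fit a 2-component mixture and keep the low-variance component.
gmm = GaussianMixture(n_components=2, random_state=0).fit(residuals)
low_var = np.argmin(gmm.covariances_.ravel())
no_change_mask = gmm.predict(residuals) == low_var

rmse = np.sqrt(np.mean(residuals[no_change_mask] ** 2))   # no-change-set RMSE
print(no_change_mask.mean(), rmse)
```

In the full method, this extracted no-change set would then be used to refit the normalization mapping (e.g., histogram matching), and its RMSE doubles as the evaluation indicator mentioned above.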
In this work, we propose a novel image reconstruction framework that directly learns a neural implicit representation in k-space for ECG-triggered non-Cartesian Cardiac Magnetic Resonance Imaging (CMR). While existing methods bin acquired data from neighboring time points to reconstruct one phase of the cardiac motion, our framework allows for a continuous, binning-free, and subject-specific k-space representation. We assign a unique coordinate, consisting of time, coil index, and frequency-domain location, to each sampled k-space point. We then learn the subject-specific mapping from these unique coordinates to k-space intensities using a multi-layer perceptron with frequency-domain regularization. During inference, we obtain a complete k-space on Cartesian coordinates at an arbitrary temporal resolution. A simple inverse Fourier transform recovers the image, eliminating the need for density compensation and costly non-uniform Fourier transforms for non-Cartesian data. This novel imaging framework was tested on 42 radially sampled datasets from 6 subjects. The proposed method outperforms other techniques qualitatively and quantitatively, using data from four heartbeats and from a single heartbeat, with 30 cardiac phases. Our results for 50-cardiac-phase reconstructions from one heartbeat show improved artifact removal and spatio-temporal resolution, highlighting the potential for real-time CMR.
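In outline, the framework amounts to regressing k-space values from coordinates. The PyTorch sketch below shows the core of such a neural implicit k-space representation; the network width, the plain MSE loss, and the omission of the paper's frequency-domain regularization are simplifying assumptions.

```python
import torch
import torch.nn as nn

# MLP mapping a (time, coil index, kx, ky) coordinate to the real and
# imaginary parts of the k-space signal for one subject.
mlp = nn.Sequential(
    nn.Linear(4, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 2),                         # (real, imag)
)

coords = torch.randn(1024, 4)                  # sampled non-Cartesian points
target = torch.randn(1024, 2)                  # measured k-space values
loss = nn.functional.mse_loss(mlp(coords), target)
loss.backward()                                # fit the subject-specific mapping

# At inference: query a full Cartesian grid at any time t, then apply a
# plain inverse FFT — no density compensation or non-uniform FFT needed.
```

Since time enters as a continuous input rather than a bin index, the trained model can be evaluated at any temporal resolution, which is what enables the binning-free 30- and 50-phase reconstructions reported above.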